data synthesis
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.05)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (3 more...)
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.46)
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
Diffusion models have demonstrated highly-expressive generative capabilities in vision and NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are also powerful in modeling complex policies or trajectories in offline datasets. However, these works have been limited to single-task settings where a generalist agent capable of addressing multi-task predicaments is absent. In this paper, we aim to investigate the effectiveness of a single diffusion model in modeling large-scale multi-task offline data, which can be challenging due to diverse and multimodal data distribution. Specifically, we propose Multi-Task Diffusion Model (\textsc{MTDiff}), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > Singapore (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- (3 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Banking & Finance (1.00)
Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis
Chen, Kai, Gong, Chen, Wang, Tianhao
In differentially private (DP) tabular data synthesis, the consensus is that statistical models are better than neural network (NN)-based methods. However, we argue that this conclusion is incomplete and overlooks the challenge of densely correlated datasets, where intricate dependencies can overwhelm statistical models. In such complex scenarios, neural networks are more suitable due to their capacity to fit complex distributions by learning directly from samples. Despite this potential, existing NN-based algorithms still suffer from significant limitations. We therefore propose MargNet, incorporating successful algorithmic designs of statistical models into neural networks. MargNet applies an adaptive marginal selection strategy and trains the neural networks to generate data that conforms to the selected marginals. On sparsely correlated datasets, our approach achieves utility close to the best statistical method while offering an average 7$\times$ speedup over it. More importantly, on densely correlated datasets, MargNet establishes a new state-of-the-art, reducing fidelity error by up to 26\% compared to the previous best. We release our code on GitHub.\footnote{https://github.com/KaiChen9909/margnet}
Democratizing Tabular Data Access with an Open$\unicode{x2013}$Source Synthetic$\unicode{x2013}$Data SDK
Krchova, Ivona, Vieyra, Mariana Vargas, Scriminaci, Mario, Sidorenko, Andrey
Abstract--Machine learning development critically depends on access to high-quality data. However, increasing restrictions due to privacy, proprietary interests, and ethical concerns have created significant barriers to data accessibility. Synthetic data offers a viable solution by enabling safe, broad data usage without compromising sensitive information. This paper presents the MOSTL Y AI Synthetic Data Software Development Kit (SDK), an open-source toolkit designed specifically for synthesizing high-quality tabular data. The SDK integrates robust features such as differential privacy guarantees, fairness-aware data generation, and automated quality assurance into a flexible and accessible Python interface. Leveraging the T abularARGN autoregressive framework, the SDK supports diverse data types and complex multi-table and sequential datasets, delivering competitive performance with notable improvements in speed and usability. Currently deployed both as a cloud service and locally installable software, the SDK has seen rapid adoption, highlighting its practicality in addressing real-world data bottlenecks and promoting widespread data democratization. HE development of Machine Learning applications requires broad access to training data. This necessity has become more critical in recent years with the advent of Deep Learning, which requires large-scale datasets to effectively train models.
Non-Rival Data as Rival Products: An Encapsulation-Forging Approach for Data Synthesis
Wang, Kaidong, Li, Jiale, Lin, Shao-Bo, Wang, Yao
The non-rival nature of data creates a dilemma for firms: sharing data unlocks value but risks eroding competitive advantage. Existing data synthesis methods often exacerbate this problem by creating data with symmetric utility, allowing any party to extract its value. This paper introduces the Encapsulation-Forging (EnFo) framework, a novel approach to generate rival synthetic data with asymmetric utility. EnFo operates in two stages: it first encapsulates predictive knowledge from the original data into a designated ``key'' model, and then forges a synthetic dataset by optimizing the data to intentionally overfit this key model. This process transforms non-rival data into a rival product, ensuring its value is accessible only to the intended model, thereby preventing unauthorized use and preserving the data owner's competitive edge. Our framework demonstrates remarkable sample efficiency, matching the original data's performance with a fraction of its size, while providing robust privacy protection and resistance to misuse. EnFo offers a practical solution for firms to collaborate strategically without compromising their core analytical advantage.
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > California (0.04)
- Research Report > Promising Solution (0.48)
- Overview > Innovation (0.34)
- Marketing (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI
Sang, Jitao, Xiao, Jinlin, Han, Jiarun, Chen, Jilin, Chen, Xiaoyi, Wei, Shuyu, Sun, Yongjie, Wang, Yuhang
The rapid evolution of agentic AI marks a new phase in artificial intelligence, where Large Language Models (LLMs) no longer merely respond but act, reason, and adapt. This survey traces the paradigm shift in building agentic AI: from Pipeline-based systems, where planning, tool use, and memory are orchestrated by external logic, to the emerging Model-native paradigm, where these capabilities are internalized within the model's parameters. We first position Reinforcement Learning (RL) as the algorithmic engine enabling this paradigm shift. By reframing learning from imitating static data to outcome-driven exploration, RL underpins a unified solution of LLM + RL + Task across language, vision and embodied domains. Building on this, the survey systematically reviews how each capability -- Planning, Tool use, and Memory -- has evolved from externally scripted modules to end-to-end learned behaviors. Furthermore, it examines how this paradigm shift has reshaped major agent applications, specifically the Deep Research agent emphasizing long-horizon reasoning and the GUI agent emphasizing embodied interaction. We conclude by discussing the continued internalization of agentic capabilities like Multi-agent collaboration and Reflection, alongside the evolving roles of the system and model layers in future agentic AI. Together, these developments outline a coherent trajectory toward model-native agentic AI as an integrated learning and interaction framework, marking the transition from constructing systems that apply intelligence to developing models that grow intelligence through experience.
- Workflow (1.00)
- Overview (1.00)
- Research Report > New Finding (0.45)
- Instructional Material > Course Syllabus & Notes (0.45)
- Education (1.00)
- Health & Medicine (0.92)
- Leisure & Entertainment > Games (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (5 more...)